Abstract

This paper considers the problem of detection in distributed networks in the presence of data 
falsification (Byzantine) attacks. Detection approaches considered in the paper are based on fully 
distributed consensus algorithms, where all of the nodes exchange information only with their neighbors 
in the absence of a fusion center. In such networks, we characterize the negative effect of Byzantines 
on the steady-state and transient detection performance of the conventional consensus based detection 
algorithms. To address this issue, we study the problem from the network designer’s perspective. More 
specifically, we first propose a distributed weighted average consensus algorithm that is robust to 
Byzantine attacks. We show that, under reasonable assumptions, the global test statistic for detection 
can be computed locally at each node using our proposed consensus algorithm. We exploit the statistical 
distribution of the nodes’ data to devise techniques for mitigating the influence of data falsifying 
Byzantines on the distributed detection system. Since some parameters of the statistical distribution 
of the nodes’ data might not be known a priori, we propose learning based techniques to enable an 
adaptive design of the local fusion or update rules. 
I. Introduction

Distributed detection is a well-studied topic in the detection theory literature [1]-[3]. The
traditional distributed detection framework comprises a group of spatially distributed nodes
which acquire the observations regarding the phenomenon of interest and send them to the fusion 
center (FC) where a global decision is made. However, in many scenarios a centralized FC may 
not be available or in large networks, the FC can become an information bottleneck that may 
cause degradation of system performance, and may even lead to system failure. Also, due to 
the distributed nature of future communication networks, and various practical constraints, e.g., 
absence of the FC, transmit power or hardware constraints and dynamic characteristic of wireless 
medium, it may be desirable to employ alternate peer-to-peer local information exchange in order 
to reach a global decision. One such distributed approach for peer-to-peer local information 
exchange and inference is the use of a consensus algorithm [27].

Recently, distributed detection based on consensus algorithms has been explored in [4]-[9].
In consensus based detection approaches, each node communicates only with its neighbors and 
updates its local state information about the phenomenon (summary statistic) by a local fusion 
rule that employs a weighted combination of its own value and those received from its neighbors. 
Nodes continue with this consensus iteration until the whole network converges to a steady-state 
value which is the global test statistic. In particular, the authors in [5], [6] considered average
consensus based distributed detection and emphasized network designs based on the small world 
phenomenon for faster convergence lEl. A bio-inspired consensus scheme was introduced for 
spectrum sensing in [8]. However, these consensus-based fusion algorithms only ensure equal
gain combining of local measurements. The authors in [9] proposed to use distributed weighted
fusion algorithms for cognitive radio spectrum sensing. They showed that weighted average 
consensus based schemes outperform average consensus based schemes and achieve much better 
detection performance than the equal gain combining based schemes. However, the weighted 
average consensus based detection schemes are quite vulnerable to different types of attacks. 
One typical attack on such networks is a Byzantine attack. While Byzantine attacks (originally
proposed in [10]) may, in general, refer to many types of malicious behavior, our focus in this
paper is on data-falsification attacks [11]-[17]. Thus far, research on detection in the presence of
Byzantine attacks has predominantly focused on addressing these attacks under the centralized
model [13], [14], [18], [19]. A few attempts have been made to address the security threats in
the distributed or consensus based schemes in recent research [20]-[25]. Most of these existing
works on countering Byzantine or data falsification attacks in distributed networks rely on a
threshold for detecting Byzantines. The main idea is to exclude from the neighbors list those nodes
whose state information deviates significantly from the mean value. In [22] and [25], two different
defense schemes against data falsification attacks for distributed consensus-based detection were
proposed. In [23], the scheme eliminates the state value with the largest deviation from the local
mean at each iteration step and, therefore, it can only deal with the situation in which only one
Byzantine node exists. It also excludes one state value even if there is no Byzantine node. In [25],
the vulnerability of distributed consensus-based spectrum sensing was analyzed and an outlier
detection algorithm with an adaptive threshold was proposed. The authors in [24] proposed a
Byzantine mitigation technique based on adaptive local thresholds. This scheme mitigates the
misbehavior of Byzantine nodes and tolerates the occasional large deviation introduced by honest
users. It adaptively reduces the corresponding coefficients so that the Byzantines will eventually
be isolated from the network.

Excluding the Byzantine nodes from the fusion process may not be the best strategy from the 
network designer’s perspective. As shown in our earlier work ifT^ in the context of distributed 
detection with one-bit measurements under a centralized model, an intelligent way to improve 
the performance of the network is to use the information of the identified Byzantines to the 
network’s benefit. More specifically, learning based techniques have the potential to outperform 
the existing exclusion based techniques. In this paper, we pursue such a design philosophy in 
the context of raw data based fusion in decentralized networks. 

To design methodologies for defending against Byzantine attacks, fundamental challenges 
that arise are two-fold. First, how do nodes recognize the presence of attackers? Second, after 
identification of an attacker or group of attackers, how do nodes adapt their operating parameters? 
Due to the large number of nodes and complexity of the distributed network, we develop and 
analyze schemes that would update their own operating parameters autonomously. Our approach 
further introduces an adaptive fusion based detection algorithm which supports the learning of 
the attacker’s behavior. Our scheme differs from all existing work on Byzantine mitigation based
on exclusion strategies [20]-[25], where the only defense is to identify and exclude the attackers
from the consensus process.
A. Main Contributions

In this paper, we focus on the susceptibility and protection of consensus based detection
algorithms. Our main contributions are summarized as follows:

• We characterize the effect of Byzantines on the steady-state performance of the conventional
consensus based detection algorithms. More specifically, we quantify the minimum fraction
of Byzantines needed to make the deflection coefficient of the global statistic equal to zero.

• Using probability of detection and probability of false alarm as measures of detection
performance, we investigate the degradation of transient detection performance of the conventional
consensus algorithms with Byzantines.

• We propose a robust distributed weighted average consensus algorithm and obtain closed-form
expressions for optimal weights to mitigate the effect of data falsification attacks.

• Finally, we propose a technique based on the expectation-maximization algorithm and
maximum likelihood estimation to learn the operating parameters (or weights) of the nodes
in the network to enable an adaptive design of the local fusion or update rules.

The rest of the paper is organized as follows. In Sections II and III, we introduce our system
model and Byzantine attack model, respectively. In Section IV, we study the security performance
of weighted average consensus based detection schemes. In Section V, we propose a protection
mechanism to mitigate the effect of data falsification attacks on consensus based detection
schemes. Finally, the last section concludes the paper.


II. System model 


First, we define the network model used in this paper. 

A. Network Model 

We model the network topology as an undirected graph $G = (V, E)$, where $V = \{v_1, \cdots, v_N\}$
represents the set of nodes in the network with $|V| = N$. The set of communication links in
the network corresponds to the set of edges $E$, where $(v_i, v_j) \in E$ if and only if there is a
communication link between $v_i$ and $v_j$ (so that $v_i$ and $v_j$ can directly communicate with each
other). The adjacency matrix $A$ of the graph is defined as

$$a_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$
Figure 1. A distributed network with 6 nodes


The neighborhood of a node $i$ is defined as

$$\mathcal{N}_i = \{v_j \in V : (v_i, v_j) \in E\}, \quad \forall i \in \{1, 2, \cdots, N\}.$$

The degree of a node $v_i$ in a graph $G$, denoted by $d_i$, is the number of edges in $E$ which include
$v_i$ as an endpoint, i.e., $d_i = \sum_{j=1}^{N} a_{ij}$.

The degree matrix $D$ is defined as the diagonal matrix $\mathrm{diag}(d_1, \cdots, d_N)$ and the Laplacian
matrix $L$ is defined as

$$l_{ij} = \begin{cases} d_i & \text{if } j = i, \\ -a_{ij} & \text{otherwise,} \end{cases}$$

or, in other words, $L = D - A$. For example, consider a network with six nodes trying to reach
consensus (see Figure 1). The Laplacian matrix $L$ for this network is given by

$$L = \begin{bmatrix}
 1 & -1 &  0 &  0 &  0 &  0 \\
-1 &  3 & -1 & -1 &  0 &  0 \\
 0 & -1 &  2 & -1 &  0 &  0 \\
 0 & -1 & -1 &  4 & -1 & -1 \\
 0 &  0 &  0 & -1 &  1 &  0 \\
 0 &  0 &  0 & -1 &  0 &  1
\end{bmatrix}.$$

The consensus based distributed detection scheme usually contains three phases: sensing, 
information fusion, and decision making. In the sensing phase, each node acquires the summary 
statistic about the phenomenon of interest. In this paper, we adopt the energy detection method 
so that the local summary statistic is the received signal energy. Next, in the information fusion 
phase, each node communicates with its neighbors to update their state values (summary statistic)
and continues with the consensus iteration until the whole network converges to a steady state 
which is the global test statistic. Finally, in the decision making phase, nodes make their own 
decisions about the presence of the phenomenon. Next, we describe each of these phases in 
more detail. 


B. Sensing Phase 

We consider an $N$-node network using the energy detection scheme [26]. For the $i$th node,
the sensed signal $x_i^k$ at time instant $k$ is given by

$$x_i^k = \begin{cases} n_i^k, & \text{under } H_0 \\ h_i s^k + n_i^k, & \text{under } H_1 \end{cases}$$

where $h_i$ is the channel gain, $s^k$ is the signal at time instant $k$, and $n_i^k$ is AWGN, i.e., $n_i^k \sim \mathcal{N}(0, \sigma_i^2)$
and independent across time. Each node $i$ calculates a summary statistic $Y_i$ over a detection
interval of $M$ samples, i.e.,

$$Y_i = \sum_{k=1}^{M} |x_i^k|^2,$$

where $M$ is determined by the time-bandwidth product. Since $Y_i$ is the sum of the squares of $M$
i.i.d. Gaussian random variables, it can be shown that $Y_i / \sigma_i^2$ follows a central chi-square distribution
with $M$ degrees of freedom ($\chi_M^2$) under $H_0$, and a non-central chi-square distribution with $M$
degrees of freedom and parameter $\eta_i$ under $H_1$, i.e.,

$$\frac{Y_i}{\sigma_i^2} \sim \begin{cases} \chi_M^2, & \text{under } H_0 \\ \chi_M^2(\eta_i), & \text{under } H_1 \end{cases}$$

where $\eta_i = E_s |h_i|^2 / \sigma_i^2$ is the local SNR at the $i$th node and $E_s = \sum_{k=1}^{M} |s^k|^2$ represents the
sensed signal energy over $M$ detection instants. Note that the local SNR is $M$ times the average
SNR at the output of the energy detector, which is $E_s |h_i|^2 / (M \sigma_i^2)$.
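As a rough numerical check of the sensing model, the following sketch simulates the energy statistic $Y_i$ and compares its empirical mean with $M\sigma_i^2$ under $H_0$ and $(M + \eta_i)\sigma_i^2$ under $H_1$. The constant-amplitude signal, the random seed, and the function name `energy_statistic` are assumptions made only for illustration.

```python
import math
import random


def energy_statistic(num_samples, noise_var, gain, signal_amp, rng):
    """Return Y_i = sum_k |x_i^k|^2 with x_i^k = gain * signal_amp + n_i^k."""
    noise_std = math.sqrt(noise_var)
    return sum((gain * signal_amp + rng.gauss(0.0, noise_std)) ** 2
               for _ in range(num_samples))


M, NOISE_VAR, GAIN, E_S = 12, 1.0, 0.8, 5.0   # illustrative values
AMP = math.sqrt(E_S / M)                       # assumption: constant-amplitude signal
ETA = E_S * GAIN ** 2 / NOISE_VAR              # local SNR eta_i

rng = random.Random(0)
TRIALS = 20000
# Under H0 the signal is absent, which is modelled here by a zero gain.
mean_h0 = sum(energy_statistic(M, NOISE_VAR, 0.0, AMP, rng)
              for _ in range(TRIALS)) / TRIALS
mean_h1 = sum(energy_statistic(M, NOISE_VAR, GAIN, AMP, rng)
              for _ in range(TRIALS)) / TRIALS

print(mean_h0, M * NOISE_VAR)            # empirical mean vs. M * sigma^2
print(mean_h1, (M + ETA) * NOISE_VAR)    # empirical mean vs. (M + eta) * sigma^2
```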


C. Information Fusion Phase 

Next, we give a brief introduction to conventional consensus algorithms [|T7I . Consensus is 
reached in two steps. 
Step 1: All nodes establish communication links with their neighbors, and broadcast their
information state $x_i(0) = Y_i$.

Step 2: Each node updates its local state information by a local fusion rule (weighted
combination of its own value and those received from its neighbors). We denote node $i$’s updated
information at iteration $k$ by $x_i(k)$. Node $i$ continues to broadcast information $x_i(k)$ and update
its local information state until consensus is reached. This information state updating process
can be written in a compact form as

$$x_i(k+1) = x_i(k) + \frac{\epsilon}{w_i} \sum_{j \in \mathcal{N}_i} \left( x_j(k) - x_i(k) \right), \tag{1}$$

where $\epsilon$ is the time step and $w_i$ is the weight assigned to node $i$’s information. Using the notation
$x(k) = [x_1(k), \cdots, x_N(k)]^T$, the network dynamics can be represented in matrix form as

$$x(k+1) = W x(k),$$

where $W = I - \epsilon\, \mathrm{diag}(1/w_1, \cdots, 1/w_N) L$ is referred to as a Perron matrix. The consensus
algorithm is nothing but a local fusion or update rule that fuses the nodes’ local information
state with information coming from neighbor nodes, and every node asymptotically reaches the
same information state for arbitrary initial values.
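A minimal sketch of the update rule (1) on the six-node example is given below. The initial values, the weights, and the step size are hypothetical; the step size is chosen so that $\epsilon d_i / w_i < 1$ for every node, which keeps the Perron matrix nonnegative with a positive diagonal.

```python
LAPLACIAN = [
    [1, -1, 0, 0, 0, 0],
    [-1, 3, -1, -1, 0, 0],
    [0, -1, 2, -1, 0, 0],
    [0, -1, -1, 4, -1, -1],
    [0, 0, 0, -1, 1, 0],
    [0, 0, 0, -1, 0, 1],
]


def run_consensus(initial, weights, laplacian, eps, iterations):
    """Iterate x_i <- x_i + (eps / w_i) * sum_{j in N_i} (x_j - x_i)."""
    n = len(initial)
    x = list(initial)
    for _ in range(iterations):
        # (L x)_i equals the sum over neighbors of (x_i - x_j),
        # so subtracting (eps / w_i) * (L x)_i applies the update rule.
        lx = [sum(laplacian[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] - eps / weights[i] * lx[i] for i in range(n)]
    return x


Y = [10.0, 14.0, 12.0, 9.0, 15.0, 11.0]     # hypothetical local statistics
WEIGHTS = [1.0, 2.0, 1.0, 1.5, 1.0, 0.5]    # hypothetical weights w_i
EPS = 0.2                                    # below min_i (w_i / d_i) = 0.375

final_state = run_consensus(Y, WEIGHTS, LAPLACIAN, EPS, 2000)
weighted_average = sum(w * y for w, y in zip(WEIGHTS, Y)) / sum(WEIGHTS)
print(final_state)
print(weighted_average)
```

The iteration preserves $\sum_i w_i x_i(k)$ because the columns of $L$ sum to zero, which is why every node ends up at the weighted average of the initial values.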

D. Decision Making Phase 

The final information state after reaching consensus for the above consensus algorithm will be
the weighted average of the initial states of all the nodes [27], or $x^* = \sum_{i=1}^{N} w_i Y_i / \sum_{i=1}^{N} w_i$.
Average consensus can be seen as a special case of weighted average consensus with $w_i = w$, $\forall i$.
After the whole network reaches a consensus, each node makes its own decision about the
hypothesis using a predefined threshold $\lambda$¹:

$$\text{Decision} = \begin{cases} H_1 & \text{if } x_i^* > \lambda, \\ H_0 & \text{otherwise.} \end{cases}$$

¹ In practice, parameters such as the threshold $\lambda$ and the consensus time step $\epsilon$ can be set off-line. This study is beyond the scope of
this work.
(2)


Wi = 




Note that, after reaching consensus, $x_i^* = \Lambda$, $\forall i$. Thus, in the rest of the paper, $\Lambda$ is referred to
as the final test statistic.

Next, we discuss Byzantine attacks on consensus based detection schemes and analyze the 
performance degradation of the weighted average consensus based detection algorithm due to 
these attacks. 


III. Attacks on Consensus based Detection Algorithms 

When there are no adversaries in the network, we noted in the last section that consensus on
the weighted average of arbitrary initial values can be reached by having the nodes use the
update strategy $x(k+1) = W x(k)$ with an appropriate weight matrix $W$. Suppose, however, that
instead of broadcasting the true sensing statistic $Y_i$ and applying the update strategy (1), some
nodes (referred to as Byzantines) deviate from the prescribed strategies. Byzantines
can attack in two ways: data falsification (nodes falsify their initial data or weight values) and
consensus disruption (nodes do not follow the update rule given by (1)). More specifically, Byzantine
node $i$ can do the following:

Data falsification: $x_i(0) = Y_i + \Delta_i$, or $w_i \rightarrow \tilde{w}_i$

Consensus disruption: $x_i(k+1) = x_i(k) + \frac{\epsilon}{w_i} \sum_{j \in \mathcal{N}_i} \left( x_j(k) - x_i(k) \right) + u_i(k)$,

where $(\Delta_i, \tilde{w}_i)$ and $u_i(k)$ are introduced at the initialization step and at the update step $k$,
respectively. The attack model considered above is extremely general, and allows Byzantine node
$i$ to update its value in a completely arbitrary manner (via appropriate choices of $(\Delta_i, \tilde{w}_i)$ and
$u_i(k)$ at each time step). An adversary performing a consensus disruption attack has the objective
of disrupting the consensus operation. However, consensus disruption attacks can be easily detected
because of the nature of the attack. The identification of consensus disruption attackers has been
investigated in the past literature (e.g., see [7], [28]), where control theoretic techniques were
developed to identify disruption attackers in a single consensus iteration. Knowing the existence
of such an identification mechanism, a smart adversary will aim to disguise itself while degrading
the detection performance. In contrast to disruption attackers, data falsification attackers are more
capable and can manage to disguise themselves while degrading the detection performance of
the network by falsifying their data. The susceptibility of consensus strategies to data
falsification attacks, and their protection against them, have received scant attention, and this is the
focus of our work. In this paper,
we assume that an attacker performs only a data falsification attack by introducing $(\Delta_i, \tilde{w}_i)$ during
initialization. We exploit the statistical distribution of the initial values and devise techniques to
mitigate the influence of Byzantines on the distributed detection system.

A. Data Falsification Attack 

In data falsification attacks, attackers try to manipulate the final test statistic (i.e.,
$\Lambda = \sum_{i=1}^{N} w_i Y_i / \sum_{i=1}^{N} w_i$) in such a way that the detection performance is degraded. We consider a
network with $N$ nodes that uses Algorithm (1) for reaching consensus. Algorithm (1) can be
interpreted as follows: the weight $w_i$, given to node $i$’s data $Y_i$ in the final test statistic, is assigned by node
$i$ itself. So by falsifying initial values $Y_i$ or weights $w_i$, the attackers can manipulate the final
test statistic. Detection performance will be degraded because Byzantine nodes can always set a
higher weight to their manipulated information. Thus, the final statistic’s value across the whole
network will be dominated by the Byzantine node’s local statistic, which will lead to degraded
detection performance.

Next, we define a mathematical model for data falsification attackers. We analyze the
degradation in detection performance of the network when Byzantines falsify their initial values $Y_i$
for fixed arbitrary weights $\tilde{w}_i$.

B. Attack Model 

The objective of Byzantines is to degrade the detection performance of the network by
falsifying their data $(Y_i, w_i)$. By assuming that Byzantines are intelligent and know the true
hypothesis, we analyze the worst case detection performance of the data fusion schemes. We
consider the case when the weights of the Byzantines have already been tampered to $\tilde{w}_i$ and analyze
the effect of falsifying the initial values $Y_i$. This analysis provides the most favorable case
from the point of view of Byzantines and yields the maximum performance degradation that the
Byzantines can cause. Now a mathematical model for a Byzantine attack is presented. Byzantines
tamper with their initial values $Y_i$ and send $\tilde{Y}_i$ such that the detection performance is degraded.

Under $H_0$:

$$\tilde{Y}_i = \begin{cases} Y_i + \Delta_i & \text{with probability } P_i \\ Y_i & \text{with probability } (1 - P_i) \end{cases}$$

Under $H_1$:

$$\tilde{Y}_i = \begin{cases} Y_i - \Delta_i & \text{with probability } P_i \\ Y_i & \text{with probability } (1 - P_i) \end{cases}$$


where $P_i$ is the attack probability and $\Delta_i$ is a constant value which represents the attack
strength, which is zero for honest nodes. As we show later, Byzantine nodes will use a large value
of $\Delta_i$ so that the final statistic’s value is dominated by the Byzantine node’s local statistic, which will
lead to a degraded detection performance. We use the deflection coefficient [29] to characterize the
security performance of the detection scheme due to its simplicity and its strong relationship with
the global detection performance. The deflection coefficient of the global test statistic is defined as
$\mathcal{D}(\Lambda) = \frac{(\mu_1 - \mu_0)^2}{\sigma_{(0)}^2}$, where $\mu_k = E[\Lambda | H_k]$ is the conditional mean and
$\sigma_{(k)}^2 = E[(\Lambda - \mu_k)^2 | H_k]$ is
the conditional variance. The deflection coefficient is also closely related to other performance
measures, e.g., the Receiver Operating Characteristics (ROC) curve. In general, the detection
performance monotonically increases with an increasing value of the deflection coefficient. We
define the critical point of the distributed detection network as the minimum fraction of Byzantine
nodes needed to make the deflection coefficient of the global test statistic equal to zero (in which
case, we say that the network becomes blind) and denote it by $\alpha_{blind}$. We assume that the
communication between nodes is error-free and our network topology is fixed during the whole
consensus process and, therefore, consensus can be reached without disruption.
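The probabilistic attack model of this section can be sketched as a small sampling routine. The function name `falsify`, the seed, and the numerical values are hypothetical and serve only to illustrate that, under $H_0$, the average reported value is shifted upward by $P_i \Delta_i$.

```python
import random


def falsify(y, hypothesis, attack_prob, strength, rng):
    """Return the value a Byzantine reports for its true statistic y.

    With probability attack_prob the node adds `strength` under H0
    (hypothesis == 0) or subtracts it under H1; otherwise it reports y.
    """
    if rng.random() < attack_prob:
        return y + strength if hypothesis == 0 else y - strength
    return y


rng = random.Random(1)
TRUE_VALUE, P, DELTA = 24.0, 0.5, 6.0   # hypothetical values
reports = [falsify(TRUE_VALUE, 0, P, DELTA, rng) for _ in range(10000)]
avg_shift = sum(reports) / len(reports) - TRUE_VALUE
print(avg_shift)   # should be close to P * DELTA
```

An honest node corresponds to `strength = 0`, in which case the routine returns the true statistic regardless of the attack probability.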

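In the next section, we analyze the security performance of consensus based detection schemes
in the presence of data falsifying Byzantines.

IV. Performance analysis of consensus based detection algorithms

In this section, we analyze the effect of data falsification attacks on conventional consensus
based detection algorithms.

First, in Section IV-A, we characterize the effect of Byzantines on the steady-state performance
of the consensus based detection algorithms and determine $\alpha_{blind}$. Next, in Section IV-B, using
probability of detection and probability of false alarm as measures of detection performance, we
investigate the degradation of transient detection performance of the consensus algorithms with
Byzantines.

The conditional means and variance of the global test statistic used in Section IV-A, where the first
$N_1$ nodes are Byzantines with weights $\tilde{w}_i$ and $\mathrm{sum}(w)$ denotes the sum of all weights, are

$$\mu_0 = \sum_{i=1}^{N_1} \left[ P_i \frac{\tilde{w}_i}{\mathrm{sum}(w)} (M\sigma_i^2 + \Delta_i) + (1 - P_i) \frac{\tilde{w}_i}{\mathrm{sum}(w)} M\sigma_i^2 \right] + \sum_{i=N_1+1}^{N} \left[ \frac{w_i}{\mathrm{sum}(w)} M\sigma_i^2 \right] \tag{3}$$

$$\mu_1 = \sum_{i=1}^{N_1} \left[ P_i \frac{\tilde{w}_i}{\mathrm{sum}(w)} \left( (M + \eta_i)\sigma_i^2 - \Delta_i \right) + (1 - P_i) \frac{\tilde{w}_i}{\mathrm{sum}(w)} (M + \eta_i)\sigma_i^2 \right] + \sum_{i=N_1+1}^{N} \left[ \frac{w_i}{\mathrm{sum}(w)} (M + \eta_i)\sigma_i^2 \right] \tag{4}$$

$$\sigma_{(0)}^2 = \sum_{i=1}^{N_1} \left( \frac{\tilde{w}_i}{\mathrm{sum}(w)} \right)^2 \left[ P_i (1 - P_i) \Delta_i^2 + 2M\sigma_i^4 \right] + \sum_{i=N_1+1}^{N} \left( \frac{w_i}{\mathrm{sum}(w)} \right)^2 \left[ 2M\sigma_i^4 \right] \tag{5}$$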

A. Steady-State Performance Analysis with Byzantines 

Without loss of generality, we assume that the nodes corresponding to the first $N_1$ indices
$i = 1, \cdots, N_1$ are Byzantines and the rest corresponding to indices $i = N_1 + 1, \cdots, N$ are honest
nodes. Let us define $w = [\tilde{w}_1, \cdots, \tilde{w}_{N_1}, w_{N_1+1}, \cdots, w_N]^T$ and
$\mathrm{sum}(w) = \sum_{i=1}^{N_1} \tilde{w}_i + \sum_{i=N_1+1}^{N} w_i$.

Lemma 1. For data fusion schemes, the condition to blind the network or to make the deflection
coefficient zero is given by

$$\sum_{i=1}^{N_1} \tilde{w}_i \left( 2 P_i \Delta_i - \eta_i \sigma_i^2 \right) = \sum_{i=N_1+1}^{N} w_i \eta_i \sigma_i^2.$$

Proof: The local test statistic $Y_i$ has the mean

$$\mathrm{Mean}_i = \begin{cases} M\sigma_i^2 & \text{if } H_0 \\ (M + \eta_i)\sigma_i^2 & \text{if } H_1 \end{cases}$$

and the variance

$$\mathrm{Var}_i = \begin{cases} 2M\sigma_i^4 & \text{if } H_0 \\ 2(M + 2\eta_i)\sigma_i^4 & \text{if } H_1. \end{cases}$$

The goal of Byzantine nodes is to make the deflection coefficient as small as possible.
Since the deflection coefficient is always non-negative, the Byzantines seek to make
$\mathcal{D}(\Lambda) = \frac{(\mu_1 - \mu_0)^2}{\sigma_{(0)}^2} = 0$. The conditional means $\mu_k = E[\Lambda | H_k]$ and the conditional variance
$\sigma_{(0)}^2 = E[(\Lambda - \mu_0)^2 | H_0]$ of the global test statistic,
$\Lambda = \left( \sum_{i=1}^{N_1} \tilde{w}_i \tilde{Y}_i + \sum_{i=N_1+1}^{N} w_i Y_i \right) / \mathrm{sum}(w)$, can be
computed and are given by (3), (4) and (5), respectively. After substituting values from (3), (4) and
(5), the condition to make $\mathcal{D}(\Lambda) = 0$ becomes

$$\sum_{i=1}^{N_1} \tilde{w}_i \left( 2 P_i \Delta_i - \eta_i \sigma_i^2 \right) = \sum_{i=N_1+1}^{N} w_i \eta_i \sigma_i^2.$$

Note that, when $\tilde{w}_i = w_i = z$, $\eta_i = \eta$, $\sigma_i = \sigma$, $P_i = P$, $\Delta_i = D$, $\forall i$, the blinding condition
simplifies to $\frac{N_1}{N} = \frac{1}{2} \frac{\eta \sigma^2}{P D}$. This condition indicates that by appropriately choosing attack
parameters $(P, D)$, an adversary needs less than 50% of sensing data falsifying Byzantines
to make the deflection coefficient zero.

Figure 2. Deflection Coefficient as a function of attack parameters P and D.
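The blinding condition can be checked numerically with the sketch below. It evaluates the conditional means and the $H_0$ variance of the global test statistic (a weighted sum of independent local statistics, each Byzantine statistic being a two-component mixture) and then forms the deflection coefficient. The tuple layout, the function names, and the parameter values are assumptions chosen for illustration.

```python
def deflection(byzantines, honest, num_samples):
    """Deflection coefficient of the global test statistic.

    byzantines: list of (w_tilde, P, Delta, eta, sigma2) tuples.
    honest:     list of (w, eta, sigma2) tuples.
    """
    total_w = sum(b[0] for b in byzantines) + sum(h[0] for h in honest)
    mu0 = mu1 = var0 = 0.0
    for w, p, delta, eta, s2 in byzantines:
        a = w / total_w
        mu0 += a * (num_samples * s2 + p * delta)            # mean under H0
        mu1 += a * ((num_samples + eta) * s2 - p * delta)    # mean under H1
        var0 += a * a * (p * (1 - p) * delta ** 2 + 2 * num_samples * s2 ** 2)
    for w, eta, s2 in honest:
        a = w / total_w
        mu0 += a * num_samples * s2
        mu1 += a * (num_samples + eta) * s2
        var0 += a * a * 2 * num_samples * s2 ** 2
    return (mu1 - mu0) ** 2 / var0


def blind_fraction(eta, sigma2, p, delta):
    """Fraction N_1 / N that blinds a homogeneous network."""
    return eta * sigma2 / (2 * p * delta)


M = 12
# Homogeneous example: 2 Byzantines out of 6 nodes, eta = 3, sigma^2 = 1.
attacked = deflection([(1.0, 0.9, 5.0, 3.0, 1.0)] * 2, [(1.0, 3.0, 1.0)] * 4, M)
no_attack = deflection([(1.0, 0.9, 0.0, 3.0, 1.0)] * 2, [(1.0, 3.0, 1.0)] * 4, M)
print(attacked, no_attack, blind_fraction(3.0, 1.0, 0.9, 5.0))
```

With $P = 0.9$ and $D = 5$ the homogeneous condition gives $N_1/N = 3/9 = 1/3$, so two Byzantines out of six nodes are enough to drive the deflection coefficient to zero in this example.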

Next, to gain insights into the solution, we present some numerical results in Figure 2. We
plot the deflection coefficient of the global test statistic as a function of attack parameters
$P_i = P$, $\Delta_i = D$, $\forall i$. We consider a 6-node network with the topology given by the undirected graph
shown in Figure 1 to detect a phenomenon. Nodes 1 and 2 are considered to be Byzantines.
Channel gains of the nodes are assumed to be $h = [0.8, 0.7, 0.72, 0.61, 0.69, 0.9]$ and weights
are given by (2). We also assume that $M = 12$, $E_s = 5$ and $\sigma_i^2 = 1$. Notice that the deflection
coefficient is zero when the condition in Lemma 1 is satisfied. Another observation to make
is that the deflection coefficient can be made zero even when only two out of six nodes are
Byzantines. Thus, by appropriately choosing attack parameters $(P, D)$, less than 50% of data
falsifying Byzantines are needed to blind the network.

B. Transient Performance Analysis with Byzantines 

Next, we analyze the detection performance of the data fusion schemes, denoted as $x(t+1) = W^t x(0)$,
as a function of consensus iteration $t$ in the presence of Byzantines. For analytical
tractability, we assume that $P_i = P$, $\forall i$. We denote by $w_{ji}^t$ the element of matrix $W^t$ in the $j$th
row and $i$th column. Using these notations, we calculate the probability of detection and the
probability of false alarm at the $j$th node at consensus iteration $t$.

For sufficiently large $M$, the distribution of the Byzantine’s data $\tilde{Y}_i$ given $H_k$ is a Gaussian mixture
which comes from $\mathcal{N}((\mu_{1k})_i, (\sigma_{1k})_i^2)$ with probability $(1 - P)$ and from $\mathcal{N}((\mu_{2k})_i, (\sigma_{2k})_i^2)$ with
probability $P$, where $\mathcal{N}$ denotes the normal distribution and

$$(\mu_{10})_i = M\sigma_i^2, \quad (\mu_{20})_i = M\sigma_i^2 + \Delta_i,$$

$$(\mu_{11})_i = (M + \eta_i)\sigma_i^2, \quad (\mu_{21})_i = (M + \eta_i)\sigma_i^2 - \Delta_i,$$

$$(\sigma_{10})_i^2 = (\sigma_{20})_i^2 = 2M\sigma_i^4, \quad \text{and} \quad (\sigma_{11})_i^2 = (\sigma_{21})_i^2 = 2(M + 2\eta_i)\sigma_i^4.$$

Now, the probability density function (PDF) of $x_{ji}^t = w_{ji}^t \tilde{Y}_i$ conditioned on $H_k$ can be derived
as

$$f(x_{ji}^t | H_k) = (1 - P)\, \phi\left( w_{ji}^t (\mu_{1k})_i, (w_{ji}^t (\sigma_{1k})_i)^2 \right) + P\, \phi\left( w_{ji}^t (\mu_{2k})_i, (w_{ji}^t (\sigma_{2k})_i)^2 \right) \tag{6}$$

where $\phi(x | \mu, \sigma^2)$ (for notational convenience denoted as $\phi(\mu, \sigma^2)$) is the PDF of $X \sim \mathcal{N}(\mu, \sigma^2)$
and $\phi(x | \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / 2\sigma^2}$. Next, for clarity of exposition, we first derive our results
for a small network with two Byzantine nodes and one honest node. Later we generalize our
results for an arbitrary number of nodes, $N$.

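Notice that, for the three node case, the transient test statistic
$\Lambda_j^t = w_{j1}^t \tilde{Y}_1 + w_{j2}^t \tilde{Y}_2 + w_{j3}^t Y_3$ is a
summation of independent random variables. The conditional PDF of $x_{ji}^t = w_{ji}^t \tilde{Y}_i$ is given in (6).
For brevity, we drop the conditioning on $H_k$ and write $\mu_c^i = w_{ji}^t (\mu_{ck})_i$ and
$(\sigma_c^i)^2 = (w_{ji}^t (\sigma_{ck})_i)^2$ for mixture component $c \in \{1, 2\}$ of node $i$.
Notice that the PDF of $\Lambda_j^t$ is the convolution ($*$) of
$f(x_{j1}^t) = (1 - P)\phi(\mu_1^1, (\sigma_1^1)^2) + P\phi(\mu_2^1, (\sigma_2^1)^2)$,
$f(x_{j2}^t) = (1 - P)\phi(\mu_1^2, (\sigma_1^2)^2) + P\phi(\mu_2^2, (\sigma_2^2)^2)$ and
$f(x_{j3}^t) = \phi(\mu_1^3, (\sigma_1^3)^2)$:

$$f(\Lambda_j^t) = f(x_{j1}^t) * f(x_{j2}^t) * f(x_{j3}^t)$$

$$f(\Lambda_j^t) = \left[ (1 - P)\phi(\mu_1^1, (\sigma_1^1)^2) + P\phi(\mu_2^1, (\sigma_2^1)^2) \right] * \left[ (1 - P)\phi(\mu_1^2, (\sigma_1^2)^2) + P\phi(\mu_2^2, (\sigma_2^2)^2) \right] * \phi(\mu_1^3, (\sigma_1^3)^2)$$

$$= (1 - P)^2 \left[ \phi(\mu_1^1, (\sigma_1^1)^2) * \phi(\mu_1^2, (\sigma_1^2)^2) * \phi(\mu_1^3, (\sigma_1^3)^2) \right]$$

$$+ P^2 \left[ \phi(\mu_2^1, (\sigma_2^1)^2) * \phi(\mu_2^2, (\sigma_2^2)^2) * \phi(\mu_1^3, (\sigma_1^3)^2) \right]$$

$$+ P(1 - P) \left[ \phi(\mu_2^1, (\sigma_2^1)^2) * \phi(\mu_1^2, (\sigma_1^2)^2) * \phi(\mu_1^3, (\sigma_1^3)^2) \right]$$

$$+ (1 - P)P \left[ \phi(\mu_1^1, (\sigma_1^1)^2) * \phi(\mu_2^2, (\sigma_2^2)^2) * \phi(\mu_1^3, (\sigma_1^3)^2) \right]$$

Now, using the fact that the convolution of two normal PDFs $\phi(\mu_i, \sigma_i^2)$ and $\phi(\mu_j, \sigma_j^2)$ is again
a normal PDF with mean $(\mu_i + \mu_j)$ and variance $(\sigma_i^2 + \sigma_j^2)$, we can derive the results
below.

$$f(\Lambda_j^t) = (1 - P)^2 \left[ \phi(\mu_1^1 + \mu_1^2 + \mu_1^3, (\sigma_1^1)^2 + (\sigma_1^2)^2 + (\sigma_1^3)^2) \right]$$

$$+ P^2 \left[ \phi(\mu_2^1 + \mu_2^2 + \mu_1^3, (\sigma_2^1)^2 + (\sigma_2^2)^2 + (\sigma_1^3)^2) \right]$$

$$+ P(1 - P) \left[ \phi(\mu_2^1 + \mu_1^2 + \mu_1^3, (\sigma_2^1)^2 + (\sigma_1^2)^2 + (\sigma_1^3)^2) \right]$$

$$+ (1 - P)P \left[ \phi(\mu_1^1 + \mu_2^2 + \mu_1^3, (\sigma_1^1)^2 + (\sigma_2^2)^2 + (\sigma_1^3)^2) \right].$$

Due to the probabilistic nature of the Byzantine’s behavior, it may behave as an honest node
with a probability $(1 - P_i)$. Let $S$ denote the set of all combinations of such Byzantine strategies:

$$S = \{\{b_1, b_2\}, \{h_1, b_2\}, \{b_1, h_2\}, \{h_1, h_2\}\} \tag{7}$$

where by $b_i$ we mean that Byzantine node $i$ behaves as a Byzantine and by $h_i$ we mean that
Byzantine node $i$ behaves as an honest node. Let $A_s \in U$ denote the indices of honest nodes in
the strategy combination $s$; then, from (7) we have

$$U = \{A_1 = \{\}, A_2 = \{1\}, A_3 = \{2\}, A_4 = \{1, 2\}\}$$

$$U^c = \{A_1^c = \{1, 2\}, A_2^c = \{2\}, A_3^c = \{1\}, A_4^c = \{\}\}$$

where $\{\}$ is used to denote the null set and $m_s$ to denote the cardinality of subset $A_s \in U$. Using
these notations, we generalize our results for any arbitrary $N$.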


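Lemma 2. The test statistic of node $j$ at consensus iteration $t$, i.e.,
$\Lambda_j^t = \sum_{i=1}^{N_1} w_{ji}^t \tilde{Y}_i + \sum_{i=N_1+1}^{N} w_{ji}^t Y_i$, is
a Gaussian mixture with PDF

$$f(\Lambda_j^t | H_k) = \sum_{A_s \in U} P^{N_1 - m_s} (1 - P)^{m_s}\, \phi\left( (\mu_k)_{A_s} + \sum_{i=N_1+1}^{N} w_{ji}^t (\mu_{1k})_i,\; \sum_{i=1}^{N} \left( w_{ji}^t (\sigma_{1k})_i \right)^2 \right)$$

with $(\mu_k)_{A_s} = \sum_{u \in A_s} w_{ju}^t (\mu_{1k})_u + \sum_{u \in A_s^c} w_{ju}^t (\mu_{2k})_u$, where $m_s = |A_s|$ is the
cardinality of $A_s$.

The performance of the detection scheme in the presence of Byzantines can be represented
in terms of the probability of detection and the probability of false alarm of the network.

Proposition 1. The probability of detection and the probability of false alarm of node $j$ at
consensus iteration $t$ in the presence of Byzantines can be represented as

$$P_d^t(j) = \sum_{A_s \in U} P^{N_1 - m_s} (1 - P)^{m_s}\, Q\left( \frac{\lambda - (\mu_1)_{A_s} - \sum_{i=N_1+1}^{N} w_{ji}^t (\mu_{11})_i}{\sqrt{\sum_{i=1}^{N} \left( w_{ji}^t (\sigma_{11})_i \right)^2}} \right)$$

$$P_f^t(j) = \sum_{A_s \in U} P^{N_1 - m_s} (1 - P)^{m_s}\, Q\left( \frac{\lambda - (\mu_0)_{A_s} - \sum_{i=N_1+1}^{N} w_{ji}^t (\mu_{10})_i}{\sqrt{\sum_{i=1}^{N} \left( w_{ji}^t (\sigma_{10})_i \right)^2}} \right)$$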


Remark 1. Notice that the expressions for the probability of detection $P_d^t(j)$ and the probability of false
alarm $P_f^t(j)$ for the $N_1$ Byzantine node case involve $2^{N_1}$ combinations (the cardinality of $U$ is $2^{N_1}$).
They can, however, be represented compactly by vectorizing the expressions, i.e.,

$$P_d^t(j) = \mathbf{1}^T \left[ \mathbf{b} \otimes Q\left( \frac{\lambda \mathbf{1} - \boldsymbol{\mu}_1}{\sqrt{\sum_{i=1}^{N} \left( w_{ji}^t (\sigma_{11})_i \right)^2}} \right) \right]$$

with $\boldsymbol{\mu}_1 = A \mathbf{w}_{j,11}^t + A^c \mathbf{w}_{j,21}^t + \mathbf{1} \sum_{i=N_1+1}^{N} w_{ji}^t (\mu_{11})_i$, $B = (1 - P)A + PA^c$ and
$\mathbf{b} = [B^1 \otimes \cdots \otimes B^{N_1}]$, where boldface letters represent vectors, the $\otimes$ symbol represents element-wise multiplication,
$Q(\cdot)$ represents the element-wise Q function operation, i.e., $Q(x_1, \cdots, x_n) = [Q(x_1), \cdots, Q(x_n)]^T$,
$B^i$ is the $i$th column of matrix $B$, $\mathbf{w}_{j,c1}^t = [w_{j1}^t (\mu_{c1})_1, \cdots, w_{jN_1}^t (\mu_{c1})_{N_1}]^T$ for $c \in \{1, 2\}$, the
$2^{N_1} \times N_1$ matrix $A$ is the binary
representation of the decimal numbers from 0 to $2^{N_1} - 1$ and $A^c$ is the matrix after interchanging 1
and 0 in matrix $A$.

Similarly, the expression for the probability of false alarm $P_f^t(j)$ can be vectorized into a
compact form.
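The mixture expressions for the probability of detection and the probability of false alarm can be evaluated by enumerating the $2^{N_1}$ attack patterns directly, as in the sketch below. Several simplifications are assumptions made for brevity: all nodes share the same $\eta$ and $\sigma^2$, the row of $W^t$ is passed in by the caller, and the usage example takes a uniform row (the steady state of plain average consensus) rather than a computed matrix power. The function names are illustrative.

```python
import math
from itertools import product


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def detection_probabilities(row, byzantines, attack_prob, strength,
                            num_samples, eta, sigma2, threshold):
    """Return (P_f, P_d) at one node, given its row of W^t.

    row:        weights w_ji^t applied to each node's initial value.
    byzantines: 0-based indices of Byzantine nodes.
    """
    results = []
    byz = sorted(byzantines)
    for k in (0, 1):                                   # hypothesis H_k
        honest_mean = (num_samples + k * eta) * sigma2
        variance = 2 * (num_samples + 2 * k * eta) * sigma2 ** 2
        shift = strength if k == 0 else -strength      # +Delta under H0, -Delta under H1
        std = math.sqrt(sum(w * w * variance for w in row))
        base_mean = sum(w * honest_mean for w in row)
        prob = 0.0
        for pattern in product((0, 1), repeat=len(byz)):   # 1 means "attacks"
            weight = 1.0
            mean = base_mean
            for idx, attacks in zip(byz, pattern):
                weight *= attack_prob if attacks else (1 - attack_prob)
                mean += row[idx] * shift * attacks
            prob += weight * q_function((threshold - mean) / std)
        results.append(prob)
    return results[0], results[1]


ROW = [1.0 / 6.0] * 6   # assumption: uniform row, for illustration only
pf_clean, pd_clean = detection_probabilities(ROW, [], 0.5, 6.0, 12, 10.0, 2.0, 33.0)
pf_attack, pd_attack = detection_probabilities(ROW, [0, 1], 0.5, 6.0, 12, 10.0, 2.0, 33.0)
print(pf_clean, pd_clean)
print(pf_attack, pd_attack)
```

Direct enumeration costs $2^{N_1}$ evaluations of the Q function, which is the same count that the vectorized form in Remark 1 organizes into matrix operations.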


Next, to gain insights into the results given in Proposition 1, we present some numerical
results in Figures 3 and 4. We consider the 6-node network shown in Figure 1 where the nodes
employ the consensus algorithm (1) with $\epsilon = 0.6897$ to detect a phenomenon. Nodes 1 and 2
are considered to be Byzantines. We also assume that $\eta_i = 10$, $\sigma_i^2 = 2$, $\lambda = 33$ and $w_i = 1$.
Attack parameters are assumed to be $(P_i, \Delta_i) = (0.5, 6)$ and $\tilde{w}_i = 1.1$. To characterize the
transient performance of the weighted average consensus algorithm, in Figure 3(a) we plot the
probability of detection as a function of the number of consensus iterations when Byzantines are
not falsifying their data, i.e., $(\Delta_i = 0, \tilde{w}_i = w_i)$. Next, in Figure 3(b), we plot the probability of
detection as a function of the number of consensus iterations in the presence of Byzantines. It can
be seen that the detection performance degrades in the presence of Byzantines. In Figure 4(a),
we plot the probability of false alarm as a function of the number of consensus iterations when
Byzantines are not falsifying their data, i.e., $(\Delta_i = 0, \tilde{w}_i = w_i)$. Next, in Figure 4(b), we plot
the probability of false alarm as a function of the number of consensus iterations in the presence
of Byzantines. From both Figures 3 and 4, it can be seen that the Byzantine attack can severely
degrade transient detection performance.

From the discussion in this section, we can see that Byzantines can severely degrade both the 
steady-state and the transient detection performance of conventional consensus based detection 
algorithms. As mentioned earlier, a data falsifying Byzantine $i$ can tamper with its weight $w_i$ as well
as its sensing data $Y_i$ to degrade detection performance. One approach to mitigate the effect of
sensing data falsification is to assign weights based on the quality of the data. In other words, 
lower weight is assigned to the data of the node identified as a Byzantine. However, to implement 
this approach one has to address the following two issues. 
Figure 3. (a) Probability of detection as a function of consensus iteration steps. (b) Probability of detection as a function of
consensus iteration steps with Byzantines.

Figure 4. (a) Probability of false alarm as a function of consensus iteration steps. (b) Probability of false alarm as a function
of consensus iteration steps with Byzantines.


First, in the conventional weighted average consensus algorithm, weight Wi given to node i’s 
data is assigned by the node itself. Thus, a Byzantine node can always set a higher weight to its 
manipulated information and the final statistics will be dominated by the Byzantine nodes’ local 
statistic that will lead to degraded detection performance. It will be impossible for any algorithm 
to detect this type of malicious behavior, since any weight that a Byzantine chooses for itself 
is a legitimate value that could also have been chosen by a node that is functioning correctly. 
Thus, the conventional consensus algorithm cannot be used in the presence of an attacker. 

Second, as will be seen later, the optimal weights assigned to nodes’ sensing data depend 
on the following unknown parameters: identity of the nodes (i.e., honest or Byzantine) and 
underlying statistical distribution of the nodes’ data. 

In the next section, we address these concerns by proposing a learning based robust weighted 
average consensus algorithm. 

V. A Robust Consensus Based Detection Algorithm 

In order to address the first issue, we propose a consensus algorithm in which the weight for 
node $i$’s information is assigned (or updated) by the neighbors of node $i$ rather than by node $i$ 
itself. Note that networks deploying such an algorithm are more robust to weight manipulation 
because if a Byzantine node $j$ wants to lower the weight assigned to the data of its honest 
neighbor $i$ in the global test statistic, it has to make sure that a majority of the neighbors of $i$ 
put the same lower weight as $j$. In other words, a majority of the neighbors of every honest node 
would have to be Byzantines; otherwise, the manipulation can be treated as a consensus disruption attack and 
the Byzantines can be easily detected by techniques such as those given in [[7l, If^ . 

A. Distributed Algorithm for Weighted Average Consensus 

In this section, we address the following questions: does there exist a distributed algorithm 
that solves the weighted average consensus problem while satisfying the condition that weights 
must be assigned or updated by the neighbors $\mathcal{N}_i$ of node $i$ rather than by node $i$ itself? If 
it exists, then under what conditions or constraints does the algorithm converge? 

We consider a network of $N$ nodes with a fixed and connected topology $G(V, E)$. Next, 
we state the Perron-Frobenius theorem [30], which will be used later for the design and analysis of 
our robust weighted average consensus algorithm. 

Theorem 1 ([30]). Let $W$ be a primitive nonnegative matrix with left and right eigenvectors $u$ 
and $v$, respectively, satisfying $Wv = v$ and $u^T W = u^T$. Then, $\lim_{k \to \infty} W^k = \dfrac{v u^T}{u^T v}$. 

Using the above theorem, we take a reverse-engineering approach to design a modified Perron 
matrix $\hat{W}$ which has the weight vector $w = [w_1, w_2, \cdots, w_N]^T$, $w_i > 0$, $\forall i$, as its left eigenvector 
and $\mathbf{1}$ as its right eigenvector corresponding to eigenvalue 1. From the above theorem, if the 
modified Perron matrix $\hat{W}$ is primitive and nonnegative, then a weighted average consensus can 
be achieved. Now, the problem boils down to designing such a $\hat{W}$ which meets our requirement 
that weights are assigned or updated by the neighbors $\mathcal{N}_i$ of node $i$ rather than by node $i$ itself. 

Next, we propose a modified Perron matrix $\hat{W} = I - \epsilon (T \otimes L)$, where $L$ is the original graph 
Laplacian, $\otimes$ is the element-wise matrix multiplication operator, and $T$ is a transformation given by 

$$[T]_{ij} = \begin{cases} \dfrac{\sum_{j \in \mathcal{N}_i} w_j}{l_{ii}} & \text{if } i = j, \\[2mm] w_j & \text{otherwise,} \end{cases}$$

where $l_{ii}$ denotes the $i$th diagonal entry of $L$. 

Observe that the above transformation $T$ satisfies the condition that weights are assigned or 
updated by the neighbors $\mathcal{N}_i$ of node $i$ rather than by node $i$ itself. Based on the above transformation 
$T$, we propose our distributed consensus algorithm: 

$$x_i(k+1) = x_i(k) + \epsilon \sum_{j \in \mathcal{N}_i} w_j \left( x_j(k) - x_i(k) \right).$$

Let us denote the modified Perron matrix by $\hat{W} = I - \epsilon \hat{L}$, where $\hat{L} = T \otimes L$ is the modified Laplacian matrix. 

Next, we explore the properties of the modified Perron matrix $\hat{W}$ and show that it satisfies 
the requirements of the Perron-Frobenius theorem [30]. These properties will later be utilized 
to prove the convergence of our proposed consensus algorithm. 

Lemma 3. Let $G$ be a connected graph with $N$ nodes. Then, the modified Perron matrix $\hat{W} = I - \epsilon (T \otimes L)$, with $0 < \epsilon < \dfrac{1}{\max_i \left( \sum_{j \in \mathcal{N}_i} w_j \right)}$, satisfies the following properties. 

1) $\hat{W}$ is a nonnegative matrix with left eigenvector $w$ and right eigenvector $\mathbf{1}$ corresponding 
to eigenvalue 1; 

2) All eigenvalues of $\hat{W}$ are in a unit circle; 

3) $\hat{W}$ is a primitive matrix. 

Proof: Notice that $\hat{W} \mathbf{1} = \mathbf{1} - \epsilon (T \otimes L) \mathbf{1} = \mathbf{1}$ and $w^T \hat{W} = w^T - \epsilon w^T (T \otimes L) = w^T$. This 
implies that $\hat{W}$ has left eigenvector $w$ and right eigenvector $\mathbf{1}$ corresponding to eigenvalue 1. To 
show that $\hat{W} = I + \epsilon T \otimes A - \epsilon T \otimes D$ is nonnegative, it is sufficient to show that $w > 0$, $\epsilon > 0$ 
and $\epsilon \max_i \left( \sum_{j \in \mathcal{N}_i} w_j \right) \le 1$. Since $w$ is the left eigenvector of the modified Laplacian $\hat{L} = T \otimes L$ and $w > 0$, $\hat{W}$ is nonnegative 
if and only if 

$$0 < \epsilon \le \frac{1}{\max_i \left( \sum_{j \in \mathcal{N}_i} w_j \right)}.$$

To prove part 2), notice that all the eigenvectors of $\hat{W}$ and $\hat{L}$ are the same. Let $\gamma_j$ be the $j$th 
eigenvalue of $\hat{L}$; then, the $j$th eigenvalue of $\hat{W}$ is $\lambda_j = 1 - \epsilon \gamma_j$. Now, part 2) can be proved by 
applying the Gershgorin theorem [30] to the modified Laplacian matrix $\hat{L}$. 

To prove part 3), note that $G$ is strongly connected and, therefore, $\hat{W}$ is an irreducible 
matrix [30]. Thus, to prove that $\hat{W}$ is a primitive matrix, it is sufficient to show that $\hat{W}$ 
has a single eigenvalue with maximum modulus of 1. In [27], the authors showed that when 
$0 < \epsilon < 1 / \max_i \left( \sum_{j \neq i} a_{ij} \right)$, the original Perron matrix $W = I - \epsilon L$ has only one eigenvalue with maximum 
modulus 1 at its spectral radius. Using a similar logic, $\hat{W}$ is a primitive matrix if 

$$0 < \epsilon < \frac{1}{\max_i \left( \sum_{j \in \mathcal{N}_i} w_j \right)}.$$

(Footnote: A matrix is primitive if it is nonnegative and its $m$th power is positive for some natural number $m$.) 
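
The eigenvector and nonnegativity properties stated in Lemma 3 can be checked numerically on a small example. The sketch below is illustrative only: the 6-node ring topology is an assumption made here (the topology of Figure 1 is not reproduced in this section), `modified_perron` is a hypothetical helper name, and the weight vector and $\epsilon = 0.3$ are the values used in the paper's numerical example.

```python
def modified_perron(adjacency, w, eps):
    """Build W_hat = I - eps * (T o L) entry by entry for an undirected graph.

    Diagonal entries are 1 - eps * sum_{j in N_i} w_j and the (i, j) entry
    for a neighbor j is eps * w_j, which is what T o L yields when L = D - A.
    """
    n = len(w)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbor_sum = sum(w[j] for j in range(n) if adjacency[i][j])
        for j in range(n):
            if i == j:
                W[i][j] = 1.0 - eps * neighbor_sum
            elif adjacency[i][j]:
                W[i][j] = eps * w[j]
    return W

# Assumed topology: undirected ring on 6 nodes (not the paper's Figure 1).
n = 6
adjacency = [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)]
             for i in range(n)]
w = [0.65, 0.55, 0.48, 0.95, 0.93, 0.90]
eps = 0.3

# Upper bound on eps from Lemma 3: 1 / max_i sum_{j in N_i} w_j
eps_bound = 1.0 / max(sum(w[j] for j in range(n) if adjacency[i][j])
                      for i in range(n))
W_hat = modified_perron(adjacency, w, eps)

row_sums = [sum(row) for row in W_hat]                      # W_hat * 1
left_product = [sum(w[i] * W_hat[i][j] for i in range(n))   # w^T * W_hat
                for j in range(n)]
is_nonnegative = all(entry >= 0.0 for row in W_hat for entry in row)

print("eps bound:", round(eps_bound, 4))
print("row sums:", [round(s, 12) for s in row_sums])
print("w^T W_hat:", [round(v, 12) for v in left_product])
print("nonnegative:", is_nonnegative)
```

Note that each row $i$ of the matrix is built only from the weights of node $i$'s neighbors, which is exactly the structural requirement that a node's own weight never enters its own update.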






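Theorem 2. Consider a network with fixed and strongly connected undirected topology $G(V, E)$ 
that employs the distributed consensus algorithm 

$$x_i(k+1) = x_i(k) + \epsilon \sum_{j \in \mathcal{N}_i} w_j \left( x_j(k) - x_i(k) \right),$$

where 

$$0 < \epsilon < \frac{1}{\max_i \left( \sum_{j \in \mathcal{N}_i} w_j \right)}.$$

Then, consensus with $x_i^* = \dfrac{\sum_{n=1}^{N} w_n x_n(0)}{\sum_{n=1}^{N} w_n}$, $\forall i$, is reached asymptotically. 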


Proof: A consensus is reached asymptotically if the limit $\lim_{k \to \infty} \hat{W}^k$ exists. According to the 
Perron-Frobenius theorem [30], this limit exists for primitive matrices. Note that $\mathbf{1} = [1, \cdots, 1]^T$ 
and $w$ are right and left eigenvectors of the primitive nonnegative matrix $\hat{W}$, respectively. Thus, 
from [30], 

$$\lim_{k \to \infty} x(k) = \lim_{k \to \infty} \hat{W}^k x(0) = \mathbf{1} \, \frac{w^T x(0)}{w^T \mathbf{1}}, \qquad x^* = \mathbf{1} \, \frac{\sum_{i=1}^{N} w_i x_i(0)}{\sum_{i=1}^{N} w_i}.$$

(Footnote: An irreducible stochastic matrix is primitive if it has only one eigenvalue with maximum modulus.) 

Figure 5. Convergence of the network with 6 nodes ($\epsilon = 0.3$). 

Next, to gain insights into the convergence property of the proposed algorithm, we present 
some numerical results in Figure 5. We consider the 6-node network shown in Figure 1, where 
the nodes employ the proposed algorithm (with $\epsilon = 0.3$) to reach a consensus. Next, we plot the 
updated state values at each node as a function of consensus iterations. We assume that the initial 
data vector is $x(0) = [5, 2, 7, 9, 8, 1]^T$ and the weight vector is $w = [0.65, 0.55, 0.48, 0.95, 0.93, 0.90]^T$. 
Note that the parameter $\epsilon$ satisfies the condition mentioned in Theorem 2. Figure 5 shows the 
convergence of the proposed algorithm iterations for a fixed communication graph. It is observed 
that within 20 iterations consensus has been reached on the global decision statistic, i.e., the weighted 
average of the initial values (states). 
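
The experiment described above can be approximated with a few lines of code. This is a sketch under stated assumptions rather than a reproduction of Figure 5: the initial data, weights and $\epsilon$ are the ones quoted in the text, but the ring topology is an assumption (Figure 1 is not available here), so the convergence speed may differ from the reported 20 iterations. The function name `consensus_step` is hypothetical.

```python
def consensus_step(x, w, neighbors, eps):
    """One synchronous update: x_i <- x_i + eps * sum_{j in N_i} w_j * (x_j - x_i)."""
    return [x[i] + eps * sum(w[j] * (x[j] - x[i]) for j in neighbors[i])
            for i in range(len(x))]

n = 6
# Assumed topology: each node talks to its two ring neighbors.
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

x0 = [5.0, 2.0, 7.0, 9.0, 8.0, 1.0]           # initial data vector from the text
w = [0.65, 0.55, 0.48, 0.95, 0.93, 0.90]      # weight vector from the text
eps = 0.3                                     # step size from the text

# Value predicted by Theorem 2: the weighted average of the initial states.
target = sum(wi * xi for wi, xi in zip(w, x0)) / sum(w)

x = list(x0)
for _ in range(300):
    x = consensus_step(x, w, neighbors, eps)

max_error = max(abs(xi - target) for xi in x)
print(f"weighted average of x(0): {target:.6f}")
print(f"largest deviation after 300 steps: {max_error:.2e}")
```

Each node only reads its neighbors' states and weights, so the loop body corresponds to what a single node would execute locally in every consensus round.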

In the proposed consensus algorithm, weights assigned to node $i$’s data are updated by 
neighbors of the node $i$ rather than by node $i$ itself, which addresses the first issue. 

B. Adaptive Design of the Update Rules based on Learning of Nodes’ Behavior 

Next, to address the second issue, we exploit the statistical distribution of the sensing data and 
devise techniques to mitigate the influence of Byzantines on the distributed detection system. 
We propose a three-tier mitigation scheme where the following three steps are performed at each 
node: 1) identification of Byzantine neighbors, 2) estimation of parameters of identified Byzantine 
neighbors, and 3) adaptation of the consensus algorithm (or update weights) using estimated 
parameters. 

We first present the design of distributed optimal weights for the honest/Byzantine nodes 
assuming that the identities of the nodes are known. Later we will explain how the identity of 
nodes (i.e., honest/Byzantine) can be determined. 

1) Design of Distributed Optimal Weights in the Presence of Byzantines: In this subsection, 
we derive closed form expressions for the distributed optimal weights which maximize the 
deflection coefficient. First, we consider the global test statistic 

$$\Lambda = \sum_{i=1}^{N_1} \delta_i^B Y_i + \sum_{i=N_1+1}^{N} \delta_i^H Y_i$$

and obtain a closed form solution for optimal centralized weights. Then, we extend our analysis 
to the distributed scenario. Let us denote by $\delta_i^B$ the centralized weight given to the 
Byzantine node and by $\delta_i^H$ the centralized weight given to the honest node. By considering 

$$\delta_i^B = w_i^B \Big/ \Big( \sum_{i=1}^{N_1} w_i^B + \sum_{i=N_1+1}^{N} w_i^H \Big), \qquad \delta_i^H = w_i^H \Big/ \Big( \sum_{i=1}^{N_1} w_i^B + \sum_{i=N_1+1}^{N} w_i^H \Big),$$

the optimal weight design problem can be stated formally as: 

$$\max \ \frac{(\mu_1 - \mu_0)^2}{\sigma_{(0)}^2} \qquad \text{s.t.} \quad \sum_{i=1}^{N_1} \delta_i^B + \sum_{i=N_1+1}^{N} \delta_i^H = 1,$$

where $\mu_1$, $\mu_0$ and $\sigma_{(0)}^2$ are given as in (3), (4) and (5), respectively. The solution of the above 
problem is presented in the next lemma. 

Figure 6. ROC for different protection approaches 


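Lemma 4. Optimal centralized weights which maximize the deflection coefficient are given as 

$$\delta_i^B = \frac{w_i^B}{\sum_{i=1}^{N_1} w_i^B + \sum_{i=N_1+1}^{N} w_i^H}, \qquad \delta_i^H = \frac{w_i^H}{\sum_{i=1}^{N_1} w_i^B + \sum_{i=N_1+1}^{N} w_i^H},$$

where $w_i^B = \dfrac{\eta_i \sigma_i^2 - 2 P_i \Delta_i}{\Delta_i^2 P_i (1 - P_i) + 2 M \sigma_i^4}$ and $w_i^H = \dfrac{\eta_i}{2 M \sigma_i^2}$. 

Proof: The above results can be obtained by equating the derivative of the deflection 
coefficient to zero. ■ 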


Remark 2. Distributed optimal weights can be chosen as $w_i^B$ and $w_i^H$. Thus, the value of the 
global test statistic (or final weighted average consensus) is the same as that of the optimal centralized 
weighted combining. 
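
As a worked illustration of Lemma 4, the sketch below evaluates the closed forms $w_i^B = (\eta_i \sigma_i^2 - 2 P_i \Delta_i) / (\Delta_i^2 P_i (1 - P_i) + 2 M \sigma_i^4)$ and $w_i^H = \eta_i / (2 M \sigma_i^2)$ and normalizes them into centralized weights. The parameter values $M = 12$, $\eta_i = 3$ and $(P_i, \Delta_i) = (0.5, 9)$ are those of the paper's numerical example; the noise variance $\sigma_i^2 = 0.5$ is an assumed reading, and the split into 2 Byzantine and 4 honest nodes is a hypothetical choice. The function names are illustrative.

```python
def byzantine_weight(eta, sigma2, M, P, delta):
    """Unnormalized weight for a Byzantine node (closed form assumed from Lemma 4)."""
    return (eta * sigma2 - 2.0 * P * delta) / (
        delta ** 2 * P * (1.0 - P) + 2.0 * M * sigma2 ** 2)

def honest_weight(eta, sigma2, M):
    """Unnormalized weight for an honest node (closed form assumed from Lemma 4)."""
    return eta / (2.0 * M * sigma2)

M, eta, sigma2 = 12, 3.0, 0.5        # sigma2 = 0.5 is an assumed reading
P, delta = 0.5, 9.0                  # attack parameters (P_i, Delta_i)
num_byzantine, num_honest = 2, 4     # hypothetical network composition

w_b = byzantine_weight(eta, sigma2, M, P, delta)
w_h = honest_weight(eta, sigma2, M)

# Centralized weights: each unnormalized weight divided by the sum of all weights.
total = num_byzantine * w_b + num_honest * w_h
centralized = [w_b / total] * num_byzantine + [w_h / total] * num_honest

print(f"w^B = {w_b:.4f}, w^H = {w_h:.4f}")
print("centralized weights:", [round(d, 4) for d in centralized])
```

With these values the Byzantine weight comes out negative, meaning the falsified data is actively discounted rather than merely ignored; a negative $w_i^B$ is also the case in which convergence of the consensus iteration is not guaranteed without an offset.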

Next, to gain insights into the solution, we present some numerical results in Figure 6 that 
corroborate our theoretical results. We assume that $M = 12$, $\eta_i = 3$, $\sigma_i^2 = 0.5$ and the attack 
parameters are $(P_i, \Delta_i) = (0.5, 9)$. In Figure 6, we compare our proposed weighted average 
consensus based detection scheme with the equal gain combining scheme and the scheme 
where Byzantines are cut off or removed from the fusion process. 

(Footnote: Note that the weights $w_i^B$ can be negative and in that case convergence of the proposed algorithm is not guaranteed. However, 
this situation can be dealt with off-line by adding a constant value to make $w_i^B > 0$ and changing the threshold $\lambda$ accordingly. More 
specifically, by choosing a constant $c$ such that $(w_i^B + c) > 0$ and $\lambda \leftarrow \lambda + \beta c$, where $\beta$ is the number of nodes with 
$w_i^B < 0$.) 

It can be clearly seen from the figure that our proposed scheme performs better than the rest 
of the schemes. 

Notice that the optimal weights for the Byzantines are functions of the attack parameters 
$(P_i, \Delta_i)$, which may not be known to the neighboring nodes in practice. In addition, the 
parameters of the honest nodes might also not be known. Thus, we propose a technique to 
learn or estimate these parameters. We then use these estimates to adaptively design the local 
fusion rules, which are updated after each learning iteration. 

2) Identification, Estimation, and Adaptive Fusion Rule: The first step at each node $m$ is to 
determine the identity ($I^i \in \{H, B\}$) of its neighboring nodes $i \in \mathcal{N}_m$. Notice that if node $i$ is 
an honest node, its data under hypothesis $H_k$ is normally distributed $\mathcal{N}((\mu_{1k})_i, (\sigma_{1k}^2)_i)$. On the 
other hand, if node $i$ is a Byzantine node, its data under hypothesis $H_k$ is a Gaussian mixture 
which comes from $\mathcal{N}((\mu_{1k})_i, (\sigma_{1k}^2)_i)$ with probability $\alpha_1^i = 1 - P_i$ and from $\mathcal{N}((\mu_{2k})_i, (\sigma_{2k}^2)_i)$ 
with probability $\alpha_2^i = P_i$. Therefore, determining the identity ($I^i \in \{H, B\}$) of neighboring 
nodes $i \in \mathcal{N}_m$ can be posed as a hypothesis testing problem: 

$I_0$ ($I^i = H$): $Y_i$ is generated from a Gaussian distribution under each hypothesis $H_k$; 

$I_1$ ($I^i = B$): $Y_i$ is generated from a Gaussian mixture distribution under each hypothesis $H_k$. 

Node classification can then be achieved using the maximum likelihood decision rule: 

$$f(Y_i \mid I_0) \ \underset{B}{\overset{H}{\gtrless}} \ f(Y_i \mid I_1) \tag{8}$$


where $f(\cdot \mid I_l)$ is the probability density function (PDF) under each hypothesis $I_l$. However, 
the parameters of the distribution are not known. Next, we propose a technique to learn these 
parameters. For an honest node $i$, the parameters to be estimated are $((\mu_{1k})_i, (\sigma_{1k}^2)_i)$, and for 
Byzantines the unknown parameter set to be estimated is $\theta = \{\alpha_j^i, (\mu_{jk})_i, (\sigma_{jk}^2)_i\}$, where $k = \{0, 1\}$, 
$j = \{1, 2\}$ and $i = 1, \cdots, N_m$, for $N_m$ neighbor nodes. These parameters are estimated 
by observing the data over multiple learning iterations. In each iteration $t$, every node in the 
network observes the data coming from its neighbors for $D$ detection intervals to learn their 
respective parameters. It is assumed that each node has the knowledge of the true hypothesis 
for $D$ detection intervals (or history) through a feedback mechanism. 

(Footnote: In the equal gain combining scheme, all the nodes (including Byzantines) are assigned the same weight.) 

First, we explain how the unknown parameter set for the distribution under the null hypothesis 
($I_0$) can be estimated. Let us denote the data coming from an honest neighbor node $i$ as $Y_i(t) = 
[y_i^0(1), \cdots, y_i^0(D_i(t)), y_i^1(D_i(t)+1), \cdots, y_i^1(D)]$, where $D_i(t)$ denotes the number of times $H_0$ 
occurred in learning iteration $t$ and $y_i^k$ denotes the data of node $i$ when the true hypothesis 
was $H_k$. To estimate the parameter set $((\mu_{1k})_i, (\sigma_{1k}^2)_i)$ of an honest neighboring node, one can 
employ a maximum likelihood based estimator (MLE). We use $((\hat{\mu}_{1k})_i(t), (\hat{\sigma}_{1k}^2)_i(t))$ to denote the 
estimates at learning iteration $t$, where each learning iteration consists of $D$ detection intervals. 
The ML estimate of $((\mu_{1k})_i, (\sigma_{1k}^2)_i)$ can be written in a recursive form as follows: 

$$(\hat{\mu}_{10})_i(t+1) = \frac{\sum_{r=1}^{t} D_i(r)}{\sum_{r=1}^{t+1} D_i(r)} (\hat{\mu}_{10})_i(t) + \frac{1}{\sum_{r=1}^{t+1} D_i(r)} \sum_{d=1}^{D_i(t+1)} y_i^0(d) \tag{11}$$

$$(\hat{\mu}_{11})_i(t+1) = \frac{\sum_{r=1}^{t} (D - D_i(r))}{\sum_{r=1}^{t+1} (D - D_i(r))} (\hat{\mu}_{11})_i(t) + \frac{1}{\sum_{r=1}^{t+1} (D - D_i(r))} \sum_{d=D_i(t+1)+1}^{D} y_i^1(d) \tag{12}$$

where the expressions for $(\hat{\sigma}_{10}^2)_i$ and $(\hat{\sigma}_{11}^2)_i$ are given in (9) and (10): 

$$(\hat{\sigma}_{10}^2)_i(t+1) = \frac{\sum_{r=1}^{t} D_i(r) \left[ (\hat{\sigma}_{10}^2)_i(t) + \left( (\hat{\mu}_{10})_i(t+1) - (\hat{\mu}_{10})_i(t) \right)^2 \right] + \sum_{d=1}^{D_i(t+1)} \left[ y_i^0(d) - (\hat{\mu}_{10})_i(t+1) \right]^2}{\sum_{r=1}^{t+1} D_i(r)} \tag{9}$$

$$(\hat{\sigma}_{11}^2)_i(t+1) = \frac{\sum_{r=1}^{t} (D - D_i(r)) \left[ (\hat{\sigma}_{11}^2)_i(t) + \left( (\hat{\mu}_{11})_i(t+1) - (\hat{\mu}_{11})_i(t) \right)^2 \right] + \sum_{d=D_i(t+1)+1}^{D} \left[ y_i^1(d) - (\hat{\mu}_{11})_i(t+1) \right]^2}{\sum_{r=1}^{t+1} (D - D_i(r))} \tag{10}$$

Observe that, by writing these expressions in a recursive manner, we need to store only $D$ data samples at any given 
learning iteration $t$, but effectively use all $tD$ data samples to determine the estimates. 
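
The storage argument can be illustrated with a short sketch of the recursive mean and variance updates for one honest neighbor under $H_0$. It assumes the pooled form of the recursion, in which the running count plays the role of $\sum_r D_i(r)$; the sample values and the function name `recursive_ml_update` are hypothetical. The recursive result is compared with the batch ML estimates computed from all samples at once.

```python
def recursive_ml_update(mean_old, var_old, count_old, new_samples):
    """Fold one learning iteration's samples into the running ML estimates.

    count_old is the number of samples seen so far and new_samples holds
    only the current iteration's data (it must be non-empty).
    """
    count_new = count_old + len(new_samples)
    mean_new = (count_old * mean_old + sum(new_samples)) / count_new
    var_new = (count_old * (var_old + (mean_new - mean_old) ** 2)
               + sum((y - mean_new) ** 2 for y in new_samples)) / count_new
    return mean_new, var_new, count_new

# Hypothetical H0 samples from one honest neighbor over three learning iterations.
batches = [[2.9, 3.4, 1.8, 4.1], [3.3, 2.2, 2.7], [3.9, 3.0, 2.5, 3.6, 1.9]]

mean, var, count = 0.0, 0.0, 0
for batch in batches:
    # Only the current batch is needed; earlier samples are summarized
    # by (mean, var, count).
    mean, var, count = recursive_ml_update(mean, var, count, batch)

# Batch ML estimates over all samples, for comparison.
all_samples = [y for batch in batches for y in batch]
batch_mean = sum(all_samples) / len(all_samples)
batch_var = sum((y - batch_mean) ** 2 for y in all_samples) / len(all_samples)

print(f"recursive: mean={mean:.6f}, var={var:.6f}")
print(f"batch    : mean={batch_mean:.6f}, var={batch_var:.6f}")
```

The two estimates agree up to floating-point rounding, which is the property that lets a node discard old samples after each learning iteration.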
Next, we explain how the unknown parameter set for the distribution under the alternate 
hypothesis ($I_1$) can be estimated. Since the data is distributed as a Gaussian mixture, we employ the 
expectation-maximization (EM) algorithm to estimate the unknown parameter set for Byzantines. 

Let us denote the data coming from a Byzantine neighbor $i$ as $Y_i(t) = [y_i^0(1), \cdots, y_i^0(D_i(t)), y_i^1(D_i(t)+1), \cdots, y_i^1(D)]$, 
where $D_i(t)$ denotes the number of times $H_0$ occurred in learning iteration $t$ and 
$y_i^k$ denotes the data of node $i$ when the true hypothesis was $H_k$. Let us denote the hidden variable 
as $z_j$ with $j = \{1, 2\}$ (or $Z = [z_1, z_2]$). Now, the joint conditional PDF of $y_i^k$ and $z_j$, given the 
parameter set, can be calculated to be 

$$P(y_i^k(d), z_j \mid \theta) = P(z_j \mid y_i^k(d), \theta) \, P(y_i^k(d) \mid (\mu_{jk})_i, (\sigma_{jk}^2)_i).$$

In the expectation step of EM, we compute the expectation of the log-likelihood function with 
respect to the hidden variables $z_j$, given the measurements $Y_i$ and the current estimate of the 
parameter set $\theta^l$. This is given by 

$$\begin{aligned} Q(\theta, \theta^l) &= E[\log P(Y_i, z \mid \theta) \mid Y_i, \theta^l] \\ &= \sum_{j=1}^{2} \sum_{d=1}^{D_i(t)} \log\left[ \alpha_j^i P(y_i^0(d) \mid (\mu_{j0})_i, (\sigma_{j0}^2)_i) \right] P(z_j \mid y_i^0(d), \theta^l) \\ &\quad + \sum_{j=1}^{2} \sum_{d=D_i(t)+1}^{D} \log\left[ \alpha_j^i P(y_i^1(d) \mid (\mu_{j1})_i, (\sigma_{j1}^2)_i) \right] P(z_j \mid y_i^1(d), \theta^l), \end{aligned}$$

where 

$$P(z_j \mid y_i^k(d), \theta^l) = \frac{\alpha_j^i(l) \, P(y_i^k(d) \mid (\mu_{jk})_i(l), (\sigma_{jk}^2)_i(l))}{\sum_{n=1}^{2} \alpha_n^i(l) \, P(y_i^k(d) \mid (\mu_{nk})_i(l), (\sigma_{nk}^2)_i(l))}. \tag{13}$$

In the maximization step of the EM algorithm, we maximize $Q(\theta, \theta^l)$ with respect to the parameter 
set $\theta$ so as to compute the next parameter set: 

$$\theta^{l+1} = \arg\max_{\theta} \, Q(\theta, \theta^l).$$
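First, we maximize $Q(\theta, \theta^l)$ subject to the constraint $\sum_{j=1}^{2} \alpha_j^i = 1$. We define the Lagrangian 
$\mathcal{L}$ as 

$$\mathcal{L} = Q(\theta, \theta^l) + \lambda \Big( \sum_{j=1}^{2} \alpha_j^i - 1 \Big).$$

Now, we equate the derivative of $\mathcal{L}$ to zero: 

$$\frac{\partial}{\partial \alpha_j^i} \mathcal{L} = \lambda + \frac{\sum_{d=1}^{D_i(t)} P(z_j \mid y_i^0(d), \theta^l)}{\alpha_j^i} + \frac{\sum_{d=D_i(t)+1}^{D} P(z_j \mid y_i^1(d), \theta^l)}{\alpha_j^i} = 0.$$

Multiplying both sides by $\alpha_j^i$ and summing over $j$ gives $\lambda = -D$. Similarly, we equate the 
derivative of $Q(\theta, \theta^l)$ with respect to $(\mu_{jk})_i$ and $(\sigma_k^2)_i$ to zero. Now, an iterative algorithm for 
all the parameters is 

$$\hat{\alpha}_j^i(l+1) = \frac{1}{D} \Big[ \sum_{d=1}^{D_i(t)} P(z_j \mid y_i^0(d), \theta^l) + \sum_{d=D_i(t)+1}^{D} P(z_j \mid y_i^1(d), \theta^l) \Big] \tag{14}$$

$$(\hat{\mu}_{j0})_i(l+1) = \frac{\sum_{d=1}^{D_i(t)} P(z_j \mid y_i^0(d), \theta^l) \, y_i^0(d)}{\sum_{d=1}^{D_i(t)} P(z_j \mid y_i^0(d), \theta^l)} \tag{15}$$

$$(\hat{\mu}_{j1})_i(l+1) = \frac{\sum_{d=D_i(t)+1}^{D} P(z_j \mid y_i^1(d), \theta^l) \, y_i^1(d)}{\sum_{d=D_i(t)+1}^{D} P(z_j \mid y_i^1(d), \theta^l)} \tag{16}$$

$$(\hat{\sigma}_{j0}^2)_i(l+1) = \frac{\sum_{n=1}^{2} \sum_{d=1}^{D_i(t)} P(z_n \mid y_i^0(d), \theta^l) \left( y_i^0(d) - (\hat{\mu}_{n0})_i(l+1) \right)^2}{\sum_{n=1}^{2} \sum_{d=1}^{D_i(t)} P(z_n \mid y_i^0(d), \theta^l)} \tag{17}$$

$$(\hat{\sigma}_{j1}^2)_i(l+1) = \frac{\sum_{n=1}^{2} \sum_{d=D_i(t)+1}^{D} P(z_n \mid y_i^1(d), \theta^l) \left( y_i^1(d) - (\hat{\mu}_{n1})_i(l+1) \right)^2}{\sum_{n=1}^{2} \sum_{d=D_i(t)+1}^{D} P(z_n \mid y_i^1(d), \theta^l)} \tag{18}$$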


In the learning iteration $t$, let the estimates after the convergence of the above algorithm be 
denoted by $\hat{\theta}(t) = \{\hat{\alpha}_j^i(t), (\hat{\mu}_{jk})_i(t), (\hat{\sigma}_{jk}^2)_i(t)\}$. These estimates are then used as the initial values 
for the next learning iteration $t + 1$ that uses a new set of $D$ data samples. 

Figure 7. ROC for different learning iterations 
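
The EM iteration can be sketched for a single hypothesis as follows. This is a simplified illustration, not the paper's full estimator: it fits a two-component one-dimensional Gaussian mixture with a variance shared by both components, uses synthetic $H_0$ data only, and initializes the means at the sample minimum and maximum. The mixture parameters (mean 3, variance 1.5, shift $\Delta = 9$ with probability 0.5) follow the values quoted in the paper's numerical example; the sample size, seed and function names are assumptions.

```python
import math
import random

def gaussian_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_component(samples, iterations=100):
    """EM for a two-component 1-D Gaussian mixture with a shared variance."""
    n = len(samples)
    overall_mean = sum(samples) / n
    alpha = [0.5, 0.5]
    mean = [min(samples), max(samples)]          # crude but separated start
    var = sum((y - overall_mean) ** 2 for y in samples) / n
    for _ in range(iterations):
        # E-step: posterior probability of each component for every sample.
        resp = []
        for y in samples:
            joint = [alpha[j] * gaussian_pdf(y, mean[j], var) for j in range(2)]
            total = joint[0] + joint[1]
            resp.append([joint[0] / total, joint[1] / total])
        # M-step: mixing probabilities and means per component.
        for j in range(2):
            mass = sum(r[j] for r in resp)
            alpha[j] = mass / n
            mean[j] = sum(r[j] * y for r, y in zip(resp, samples)) / mass
        # M-step: one variance pooled over both components.
        var = sum(r[j] * (y - mean[j]) ** 2
                  for r, y in zip(resp, samples) for j in range(2)) / n
    return alpha, mean, var

# Synthetic Byzantine data under H0: honest value plus a shift of 9 w.p. 0.5.
random.seed(0)
samples = [random.gauss(3.0, math.sqrt(1.5)) + (9.0 if random.random() < 0.5 else 0.0)
           for _ in range(2000)]

alpha, mean, var = em_two_component(samples)
print("mixing probabilities:", [round(a, 3) for a in alpha])
print("component means     :", [round(m, 3) for m in mean])
print("shared variance     :", round(var, 3))
```

In the learning scheme described above, the values returned here would serve as the starting point of the next learning iteration instead of the crude min/max initialization.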

After learning the unknown parameter set under $I_0$ and $I_1$, node classification can be achieved 
using the following maximum likelihood decision rule: 

$$\hat{f}(Y_i \mid I_0) \ \underset{B}{\overset{H}{\gtrless}} \ \hat{f}(Y_i \mid I_1) \tag{19}$$

where $\hat{f}(\cdot)$ is the PDF based on estimated parameters. 
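
A minimal sketch of this likelihood comparison is given below. It is simplified in several ways that are assumptions of the example rather than the paper's procedure: only data observed under $H_0$ is used, the estimated parameters are replaced by fixed hypothetical values, both mixture components share one variance, and the sample lists are made up for illustration.

```python
import math

def log_gaussian(y, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)

def log_likelihood_honest(samples, mean, var):
    """Log-likelihood under I_0: data drawn from a single Gaussian."""
    return sum(log_gaussian(y, mean, var) for y in samples)

def log_likelihood_byzantine(samples, alphas, means, var):
    """Log-likelihood under I_1: data drawn from a two-component Gaussian mixture."""
    total = 0.0
    for y in samples:
        total += math.log(sum(a * math.exp(log_gaussian(y, m, var))
                              for a, m in zip(alphas, means)))
    return total

def classify(samples, honest_params, byzantine_params):
    """Return 'H' if the Gaussian model is at least as likely, otherwise 'B'."""
    ll_h = log_likelihood_honest(samples, *honest_params)
    ll_b = log_likelihood_byzantine(samples, *byzantine_params)
    return "H" if ll_h >= ll_b else "B"

# Hypothetical estimates: honest data ~ N(3, 1.5); Byzantine data is an equal
# mixture of N(3, 1.5) and N(12, 1.5).
honest_params = (3.0, 1.5)
byzantine_params = ([0.5, 0.5], [3.0, 12.0], 1.5)

neighbor_a = [2.1, 3.4, 2.9, 4.2, 3.0, 1.8]      # looks honest
neighbor_b = [2.5, 12.3, 3.1, 11.6, 12.9, 3.7]   # occasionally shifted by about 9

label_a = classify(neighbor_a, honest_params, byzantine_params)
label_b = classify(neighbor_b, honest_params, byzantine_params)
print("neighbor a:", label_a)
print("neighbor b:", label_b)
```

Working with log-likelihoods rather than products of densities avoids numerical underflow when the number of detection intervals is large.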

Using the above estimates and node classification, the optimal distributed weights for honest 
nodes after learning iteration $t$ can be written as 




( 20 ) 


Similarly, the optimal distributed weights for Byzantines after learning iteration $t$ can be 
written as 


__ 

((/iio(t))i - + (di(^) (<^io)' it) + aiit) (<j2o)- it)) 


( 21 ) 


Next, we present some numerical results in Figure 7 to evaluate the performance of our 
proposed scheme. Consider the scenario where 6 nodes organized in an undirected graph (as 
shown in Figure 1) are trying to detect a phenomenon. Node 1 and node 2 are considered 
to be Byzantines. We assume that $((\mu_{10})_i, (\sigma_{10}^2)_i) = (3, 1.5)$, $((\mu_{11})_i, (\sigma_{11}^2)_i) = (4, 2)$ and the 
attack parameters are $(P_i, \Delta_i) = (0.5, 9)$. In Figure 7, we plot ROC curves for different learning 
iterations. For every learning iteration, we assume that $D_i = 10$ and $D = 20$. It can be seen from 
Figure 7 that within 4 learning iterations, the detection performance of the learning based weighted 
gain combining scheme approaches the detection performance of the weighted gain combining 
scheme with known optimal weights. 

Note that the above learning based scheme can be used in conjunction with the proposed 
weighted average consensus based algorithm to mitigate the effect of Byzantines. 

VI. Conclusion and future work 

In this paper, we analyzed the security performance of conventional consensus based 
algorithms in the presence of data falsification attacks. We showed that above a certain fraction of 
Byzantine attackers in the network, existing consensus based detection algorithms are ineffective. 
Next, we proposed a robust distributed weighted average consensus algorithm and devised a 
learning technique to estimate the operating parameters (or weights) of the nodes. This enables 
an adaptive design of the local fusion or update rules to mitigate the effect of data falsification 
attacks. There are still many interesting questions that remain to be explored in future 
work, such as an analysis of the problem for time varying topologies. Note that some analytical 
methodologies used in this paper are certainly exploitable for studying the attacks in time varying 
topologies. Other questions, such as the optimal topology which incurs the fastest convergence 
rate, can also be investigated. 